Multi-field Categorical Data

Authors

  • Ying Wen
  • Jun Wang
  • Tianyao Chen
  • Weinan Zhang
Abstract

This paper presents a method for learning distributed representations of multi-field categorical data, a common data format in applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models, e.g., factorisation machines and boosted trees, has demonstrated the potential of exploring the interactions among inter-field categories. Inspired by Word2Vec, the distributed representation for natural language, we propose the Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is automatically learned for each category in each field. The interactions among inter-field categories are further explored by different neural gates, and the most informative ones are selected by pooling layers. In our experiments, by exploring the interactions between pairwise categories over layers, the model attains great improvement over state-of-the-art models in a supervised learning task, e.g., click prediction, while capturing the most significant interactions from the data.
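
The pipeline described in the abstract (one vector per category, a gate applied to inter-field pairs, pooling to keep the most informative interactions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random (untrained) embeddings, the two example gates, and the choice to score each interaction by the sum of its gate output are all assumptions made for illustration.

```python
import random
from itertools import combinations


def make_embeddings(fields, dim, seed=0):
    """Assign a random dim-dimensional vector to every (field, category) pair.

    Assumption: in a trained model these vectors would be learned;
    here they are random placeholders.
    """
    rng = random.Random(seed)
    return {
        (field, cat): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for field, cats in fields.items()
        for cat in cats
    }


def product_gate(u, v):
    """Element-wise product of two category vectors (hypothetical gate)."""
    return [a * b for a, b in zip(u, v)]


def sum_gate(u, v):
    """Element-wise sum of two category vectors (hypothetical gate)."""
    return [a + b for a, b in zip(u, v)]


def interact_and_pool(sample, embeddings, gate, k):
    """Apply `gate` to every pair of fields in one sample and keep the top k.

    `sample` maps each field to a single category, so every pair is an
    inter-field pair. Each interaction is scored by the sum of its gate
    output (an assumed scoring rule), and the k highest scores are kept.
    """
    vectors = [(key, embeddings[key]) for key in sample.items()]
    scored = []
    for (key_a, vec_a), (key_b, vec_b) in combinations(vectors, 2):
        out = gate(vec_a, vec_b)
        scored.append((sum(out), (key_a, key_b), out))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical fields and categories, for illustration only.
    fields = {
        "city": ["London", "Paris"],
        "device": ["mobile", "desktop"],
        "weekday": ["Mon", "Tue"],
    }
    embeddings = make_embeddings(fields, dim=4)
    sample = {"city": "London", "device": "mobile", "weekday": "Tue"}
    for score, pair, _ in interact_and_pool(sample, embeddings, product_gate, k=2):
        print(pair, round(score, 3))
```

In a layered model of the kind the abstract describes, the pooled outputs of one layer would presumably feed the next interaction layer; the sketch stops after a single interaction and pooling step.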

Similar resources

Categorical fracture orientation modeling: applied to an Iranian oil field

Fracture orientation is a prominent factor in determining the reservoir fluid flow direction in a formation because fractures are the major paths through which fluid flow occurs. Hence, a true modeling of orientation leads to a reliable prediction of fluid flow. Traditionally, various distributions are used for orientation modeling in fracture networks. Although they offer a fairly suitable est...

Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction

Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Different from continuous raw features that we usually find in the image and audio domains, the input features in web space are always of multi-field and are mostly discrete and categorical while their dependen...

Implementing SASL using Categorical Multi-combinators

Categorical multi-combinators form a rewriting system developed with the aim of providing efficient implementations of lazy functional languages. The core of the system of categorical multi-combinators consists of only two rewriting laws with a very low pattern-matching complexity. This system allows the equivalent of several β-reductions to be performed at once, and avoids the generation of tri...

On Multi-dimensional Markov Chain Models

Markov chain models are commonly used to model categorical data sequences. In this paper, we propose a multi-dimensional Markov chain model for modeling high dimensional categorical data sequences. In particular, the models are practical when there are limited data available. We then test the model with some practical sales demand data. Numerical results indicate the proposed model when compare...

Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem, and provides significantly improved accuracies over other popular ensemble met...

Publication date: 2016